Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Adding text vectorization #178

Closed
wants to merge 4 commits into from
Closed

Adding text vectorization #178

wants to merge 4 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Apr 25, 2017

This is oriented to #177 . It adds the frequency vectorizer and a simple tokenizer

@ghost
Copy link
Author

ghost commented Apr 30, 2017

@AtheMathmo any concerns?

@AtheMathmo
Copy link
Owner

AtheMathmo commented May 3, 2017

Sorry for the delay in reviewing this, it's a particularly busy time for me right now...

At a cursory glance most the details here look good. I'm a little concerned about the addition of the Vectorizer trait. I feel that a modification to the Transformer trait might be more appropriate. But honestly I'd need to review the code properly to be sure.

I'll try to get around to this by the end of the week. Please do poke me again if I haven't given any feedback by then.

Edit:

Also, thank you for taking the time to do this. I think that overall it is a really cool feature!

@ghost
Copy link
Author

ghost commented May 3, 2017

AtheMathmo thanks for your answer given that you are busy right now!
If you want I can modify the Transformer trait and replacing all the implementations of it with the correct generics. I have a few concerns about this, the most important is about the fact that loading strings with the text as owned stuff. It would be better to load it from slices to save unnecessary allocations.

@ghost ghost closed this Jun 13, 2017
@k0pernicus
Copy link

k0pernicus commented Nov 24, 2017

It seems that @ghost has disappeared, and the PR has been closed (no merged) - sad :-(
Is a new text vectorization feature has been accepted since this PR @AtheMathmo ?

@zackmdavis
Copy link
Collaborator

@k0pernicus The deactivation of the author's GitHub account is probably unrelated to this project and therefore doesn't necessarily stop us from using his patches ...

(Rusty-machine has been very quiet lately—albeit crucially, not entirely dead!—while Athe has been busy with grad school and I've only managed to pitch in some review effort with spare cycles here and there; so text vectorization is definitely on the table, just without any timeline promises.)

This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants